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Abstract — This paper takes an empirical approach 
to identify operational factors at busy airports that 
may predate go-around maneuvers. Using four years 
of data from San Francisco International Airport, we 
begin our investigation with a statistical approach 
to investigate which features of airborne, ground 
operations (e.g., number of inbound aircraft, number 
of aircraft taxiing from gate, etc.) or weather are 
most likely to fluctuate, relative to nominal opera- 
tions, in the minutes immediately preceding a missed 
approach. We analyze these findings both in terms 
of their implication on current airport operations 
and discuss how the antecedent factors may affect 
NextGen. Finally, as a means to assist air traffic con- 
trollers, we draw upon techniques from the machine 
learning community to develop a preliminary alert 
system for go-around prediction. 

I. Introduction 

A missed approach or go-around (GA) occurs 
when an aircraft aborts its landing and is forced, 
instead, to land on a subsequent approach. It is 
tempting to speculate that lack of visibility at deci- 
sion height or pilot error, that is a pilot's inability 
to safely land the aircraft in a given situation, are 
a leading cause of GAs. Indeed, low ceiling and 
visibility increase the potential for missed-approach 
and the workload for pilots and controllers JT]. 
Nevertheless, as it is shown in this paper, weather 
only accounts for a small fraction of the total 
number of go-arounds at SFO. However, interviews 
with air-traffic controllers during field visits to Lo- 
gan International Airport in Boston and Laguardia 
Airport in New York refute these claims; rather, the 
controller's testimony suggests operational errors, 
such as a runway incursion, late runway departure 
from an aircraft taking off or holding position line 
violation are the primary causes of GAs. 

To increase airport throughput, NextGen' s high- 
density operations are projected at more air- 
port's than today's class B airports. High-density 
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operations require high performance procedures and 
aircraft equipage to enable Closely Spaced Parallel 
Approaches with delegated separation procedures. 
Despite the increased automation, the technolo- 
gies, there will always be errors or unexpected 
events leading to missed approach. Avionics for 
de-conflicted missed approaches for converging is 
still in the roadmap of NextGen but has not been 
addressed yet (3). To take full advantage for these 
high throughput operations, the number of missed 
approaches needs to be minimized, and therefore a 
thorough understanding of the factors that lead to 
missed approach is necessary. 

Go-arounds intensify the workload of air traffic 
controllers, as landing sequences must be amended 
to accommodate the aircraft that failed to land E). 
On a related note, the requisite revamping schedules 
taxes an airport system that maintains high-safety 
standards largely through comprehensive planning 
and delegation, go-arounds are also undesirable 
from a safety perspective [5]. Finally, go-arounds 
are costly for airlines, both in terms of the added 
fuel cost and the logistic delays absorbed from 
spending extra time airborne [6]. Finally, as go- 
arounds require personnel to quickly shift to quickly 
invoke 

This report is an exploratory study aimed at ad- 
dressing the following questions i.) can we identify 
factors that lead to GAs and ii.) if so, can these 
causes be identified in real-world data sets from 
major airports to predict GAs? If the factors causing 
GAs can be identified, then mitigation action can be 
taken in order to reduce the number of GAs without 
impacting airport throughput. 

The remainder of this report is structured as 
follows. Section [TT] describes the datasets used to 
construct our corpus of flight data. Section III 
provides a statistical analysis of conditions that may 
be closely associated with GAs and Section IV 
explores the idea of using these findings as a 
means to predict GAs. Conclusions are presented 
in Section IV] 



II. Data presentation and selection 

In an attempt to identify factors that precipitate a 
GA, data was collected from three complimentary 
datasets over a four-year period spanning January 
2006 to December 2009. Following cleaning and 
synchronization, the data was combined to create a 
single master dataset. 

A. Data presentation 

The datasets used to build our corpus of flight 
data are described below and the fields extracted 
from the datasets are presented in Table [I] The 
values of the fields were sampled for every minute 
in time during the four years of study. If a GA 
occurs during occurs within the one minute sample 
window we refer to the sample as a GA sample. 
Otherwise, the sample is referred to as a nominal 
sample. In the event a portion of the data associated 
with a particular sample of interest was absent, the 
sample was removed from the data set in order to 
ensure uniformity among dataset entries. The study 
focused on the periods of higher traffic density, that 
is 7:00 to 23:00 local time. 

Airspace data: This dataset was provided by the 
San Francisco Noise Abatement Office Q. It 
contains data for all of the flights recorded by the 
secondary radar located at Oakland International 
Airport (OAK). In the data set, it appears the range 
of the radar was increased from 45 to 60 NM over 
the four years of interest and to ensure consistency 
over the 4 year period only the data in a radius 
smaller than 45 NM was kept. For each flight, 
the dataset contains the aircraft's 4-D trajectory as 
well as metadata such as flight identification, origin 
airport, destination airport, etc. We kept only flights 
departing or arriving at one of the three largest 
airports in the Bay Area, that is San-Francisco 
International Airport (SFO), Oakland International 
Airport (OAK) and San Jose International Airport 
(SJC). 

Ground data: The ground data was extracted from 
the Aviation System Performance Metric (ASPM) 
flight database. This database contains a record of 
both the scheduled time and actual time for pullback 
from gate, takeoff, and landing for each aircraft at 
each major airport in the United States. This data 
is available for download from the Federal Aviation 
Administration (FAA) website (5). To compliment 




Fig. 1. Simplified SFO diagram and selected runway configu- 
ration 

the airspace fields, the relevant fields for all flights 
taking off or landing at SFO were extracted from 
the ASPM dataset. 

Weather and runway data: The weather infor- 
mation and the runway configuration for SFO was 
extracted from the ASPM Airport database. This 
database is also available on the FAA website lf8l . 
The database includes the weather (visibility, cloud 
ceiling, wind speed and direction) as well as the 
runway configuration in use. Figure [T] depicts the 
layout of SFO. In the configuration illustrated, 
aircraft take off from runways 1R-L (parallel run- 
ways), and land on runway 28R-L (also parallel run- 
ways); this is the configuration in use approximately 
80 percent of the time. Among weather-related 
fields, the temperature was not always available at 
each time instant. In these cases, the temperature 
measurement used was obtained by interpolation 
neighboring entries. 

B. GA detection and selection 

To facilitate our investigation of GAs, we as- 
sembled a corpus of samples, each of which is 
one of two types: i.) samples of the airport state 
during nominal operations in which no GAs oc- 
cur and ii.) samples of the airport state during a 
window in which a GA does occur. The following 
rule was used to label a flight as containing a 
GA: a flight contains a GA if during the plane's 
terminal flight phase, the plane's altitude increases 
for fifteen consecutive measurements following a 
period in which the plane descended for at least ten 
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TABLE I 
List of variables generated 



Fields Name: 'Airspace data" 



1 Time of the day 

2-5 # of ac, SFO inbound, current, average 5, 10, 15 min 

6-8 # of ac, SFO inbound, variation 5, 10, 15 min 

9-12 # of ac, SFO outbound, current, average 5, 10, 15 min 

13-15 ac, SFO outbound, variation 5, 10, 15 min 

16-29 # Same as 2-15 for OAK. 

30-43 # Same as 2-15 for SJC 

44-46 Landing rate at SFO (ac/min) 5, 10, 15 min 

47-49 Time elapsed to land the previous 4, 8, 12 ac at SFO 

50-52 Departure rate at SFO (ac/min) 5, 10, 15 min 

53-55 Time elapsed to takeoff the previous 4, 8, 12 ac 

56-58 Landing rate at OAK (ac/min) 5, 10, 15 min 

59-61 Time elapsed to land previous 4, 8, 12 ac at OAK 

62-64 Departure rate at OAK 5, 10. 15 

65-67 Time elapsed to takeoff the previous 4, 8, 12 ac at OAK 

68-70 Landing rate at SJC (ac/min) 5, 10, 15 min 

71-73 Time elapsed to land the previous 4, 8, 12 ac at SJC 

74-76 Departure rate at SJC (ac/min) 5, 10, 15 min 

77-79 Time elapsed to takeoff the previous 4, 8, 12 ac at SJC 

80-88 # of small, large, heavy, ac landing at SFO 

in past 5, 10, 15 min 

89-97 # of small, large, heavy, ac taking-off from SFO 
in past 5, 10, 15 min 



Fields Name: "Ground data 1 ' 



98-101 # of ac taxiing in, current, average 5, 10, 15 min 

102-104 # of ac taxiing in, variation 5, 10, 15 min 

105-108 # of ac taxiing out, current average 5, 10, 15 min 

109-1 11 # of ac taxiing out, variation 5, 10, 15 min 

112 Total estimated departure delay 

113-117 # of ac out delayed > 0, 10, 20, 30, 45 min 

118 Average delay by aircraft, out 

119 Total estimated delay from schedule, arrivals 
120-124 # of ac delayed in > 0, 10, 20, 30, 45 min 
125 Average delay by aircraft, in 



Fields Name: "Weather data" 



126 1 for visual MC and for instrument MC 

127-129 Ceiling, Visibility, Temperature 

130-133 Wind Angle, windspeed, headwind, crosswind 

134 Number of Runway(s) used for landing 

135 Number of Runway(s) used for take offs 



red cross. Finally, the grey line corresponds to the 
period of at least 70 seconds of climb following the 
GA. The remainder of the trajectory, including the 
eventual landing, is shown in black. 
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(a) Trajectory (b) Vertical profile 

Fig. 2. Sample landing trajectory containing a GA 

Figure [3] shows the number of GA gathered from 
the data, by quarter, starting in 2006. The blue 
corresponds to the corpus of GA selected for this 
study, the yellow corresponds to the GA occurring 
at night (23:00 to 7:00), and in red, the GA occuring 
on other runway configurations. Due to missing 
data, these numbers do not reflect the exact count 
of GA at SFO. 



Missing data: for Nov 06 




consecutive radar measurements. At the sample rate 
at which measurements were taken, this corresponds 
to approximately 70 seconds of continuous increase 
in altitude, following 45 seconds of continuous 
descent. This criterion identified nearly all GAs and 
was discerning enough to exclude the trajectories 
of helicopters and short-haul flights not associated 
with GAs. This method of detecting GAs was 
validated through manual verification on a large 
sample of trajectories. For example, Figure [2] shows 
a sample landing trajectory containing a GA. The 
blue line shows the portion of the trajectory prior to 
the GA. The yellow segment corresponds to the 45 
seconds of descent preceding the GA. The instant 
at which the GA is initiated is indicated with a 



Fig. 3. Distribution of GA over time 

IIL Parameter analysis 

This section presents a statistical analysis of the 
distribution of the different variable features for 
both nominal and GA flights. The nominal samples 
used in this section consist of 120,000 samples that 
were randomly taken from points in time no less 
than 15 minutes in time away from a flight that per- 
forms a GA. The GA corpus contains all 2155 GA 
samples. For all figures presented in this section, the 
values corresponding to the GA flights are shown in 
red, values corresponding to "nominal" flights are 
shown in blue. The following discussion highlights 
operational factors that we found interesting, either 
because the data showed a significant difference 



between the nominal and GA sample distributions, 
or because there was remarkably little difference 
between the two distributions. 

1 ) Time of day: Figure [4] presents the sample 
distributions arranged by time of the day from 7:00 
to 23:00 local time. The nominal distribution is not 
uniform on account of the runway configuration 
used for study and missing data; on some days, data 
for the morning was missing for unknown reasons. 
It appears that there are two peaks where GAs occur 
more frequently: from 9:00 to 14:00 and then again 
from 19:00 to 21:00. In the interim, the frequency 
of GAs achieves a minimum near 15:30. 




10 20 30 

# of ac inbound for SFO 




10 20 

# of ac inbound lor SFO average 1 5 n 




# of ac inbound lor SFO variation 5 ruin 



# of ac inbound for SFO var iation 15 r 



Fig. 5. Analysis of the number of aircraft incoming to SFO 




rates (number of aircraft landing per minute) over 



Fig. 4. Distribution of Time for the nominal observations and 
the GA 



2) Number of aircraft inbound for SFO: Fig- 
ure [5] shows the distribution corresponding to the 
number of airborne flights inbound for SFO present 



in the terminal airspace. Figure 5(a) shows the 



number of aircraft at the time the sample is taken, 



Figure 5(b) shows the average number of aircraft 
during the 15 minutes preceding the sample. Intu- 
itively, these statistics capture a measure of an air 
traffic controller's current and recent activity level, 
respectively. The GA and nominal disttibutions are 
similar, but there is a visible shift in the mean; the 
mean of the GA distribution is approximatively 3 
aircraft larger than that of the nominal distribution. 
Figures |5(c)| and |5(d)| present the distributions for 
nominal and GA flights in the difference between 
the number of aircraft in the system at present and 
the number of aircraft 5 and 15 minutes ago, respec- 
tively. The 5 minute variation does not illustrate a 
significant difference between distributions. The 15 
minute distributions suggests that the distribution 
of GAs is shifted slightly to the right relative to the 
associated nominal distribution. 

3) Landing/takeoff rates at SFO and aircraft 
types: Figure [6] presents an analysis of the landing 



the past 5 and 15 minutes (Figures 6(a) and 6(b) i 



as well as the number of heavy and large aircraft 



landing over the past 5 minutes (Figures 6(c) and 
6(d) I. It appears that a higher rate of landing 
increases the likelihood of a GA, but not in a 
very important manner. Moreover, it appears that 
higher numbers of large and heavy aircraft also 
tend to increase the likelihood of a GA occurring. 
Although omitted here, the number of small aircraft 
landing in the past 5, 10 and 15 minutes displayed 
no significant difference between nominal and GA 
distributions. Also omitted, for similar reasons, are 
the distributions corresponding to the takeoff rate at 
SFO. 
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Fig. 6. Analysis landing/takeoff rates at SFO and aircraft types 



4) Number of aircraft outbound from SFO: 
Figure [7] depicts the distributions associated with 
the number of aircraft outbound from SFO, and the 
variation over 5 minutes. There is no significant 
difference between the data corresponding to GAs 
and nominal flights. We omit the associated average 
and variation plots for 5, 10, and 15 minutes as there 
are no significant differences the nominal and GA 
distributions. The outbound traffic does not appear 
to have a statistical impact on the occurrence of 
GAs. 
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Fig. 7. Analysis of the number of aircraft outbound from SFO 

5) Number of aircraft inbound for OAK - Land- 
ing rate: Figure [8] presents the distribution of the 
number of flights present in the terminal airspace (in 



the air) and inbound for OAK. Figure 8(a) shows the 



number of aircraft at the time the sample is taken. 
Figure |8(b)| shows the average number of aircraft 
during the 15 minutes preceding the time of the 
sample. These measures reflect the current activity 
of the controllers and their workload over the past 
15 minutes. The nominal and GA distributions do 
not differ significantly, meaning the number of 
aircraft inbound for OAK does not appear to have a 
statistical impact on the occurrence of GAs at SFO. 



Figures 8(c) and 8(d) present the difference between 
the number of aircraft simultaneously present at the 
time of the sample and the number in the system 
5 and 15 minutes in the past, respectively. The GA 
distribution is shifted slightly to the right of the 
nominal distribution, suggesting GAs occur more 
frequently when there is an increase in the number 
of aircraft inbound for OAK during the preceding 
minutes. Note that the plots are not centered at 
0, suggesting a correlation between the runway 
configuration used at SFO and changes in the traffic 
volume inbound for OAK. The landing rates at 
OAK over the preceding 5, 10 and 15 minutes 
are not presented, since they do not imply any 
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Analysis of the number of aircraft incoming to OAK 




significant statistical impact on GAs at SFO. 

6) Number of aircraft outbound from OAK - 
Takeoff rate: Distributions in the number of aircraft 
outbound from OAK, the variation in the number 
of aircraft outbound as well as the takeoff rates are 
not presented; they present no significant difference 
between the data corresponding to the GA and 
nominal samples. 

7) Number of aircraft inbound to/outbound from 
SJC - Landing/takeoff rates: In terms of instanta- 
neous or average number of aircraft, the distribu- 
tions associated with the number of aircraft inbound 
to and outbound from SJC do not show significant 
differences. Figure |9]presents the distribution of the 
difference in the current number of inbound aircraft 
and the number of inbound aircraft 5 and 15 minutes 
ago. It appears an increase in the number of aircraft 
inbound for SJC tends to indicate an increase in the 
frequency of GAs at SFO. Note that the plots are not 
centered at 0, suggesting a correlation between the 
runway configuration used at SFO and the changes 
in traffic volume inbound for SJC. The landing and 
takeoff rates at SJC over 5, 10 and 15 minutes 
are not presented; they do not appear to have a 
statistical impact on GAs at SFO. 

8) Number of aircraft on the airport surface, 



inbound: Figure 10 presents the distributions asso- 
ciated with the number of inbound aircraft taxiing 



at SFO. Figure 10(a) shows the number of aircraft 



at the time the sample is taken and Figure 10(b) 
shows the average number of aircraft over the 
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Fig. 9. Analysis of the changes in number of aircraft incoming 
to SJC 



preceding 15 minutes. These measures reflect the 
current congestion at the airport for aircraft taxiing- 
in. The shape of nominal and GA distributions 
are very similar, with the GA distributions slightly 
skewed toward higher aircraft counts. Figures [l0(c)| 
and 10(d) present distributions for the difference be- 
tween the number of aircraft simultaneously present 
at the time of sample and 5 and 15 minutes ago, 
respectively. The plots would suggest that a high 
number of incoming aircraft slightly increases the 
probability of having a GA, but some GAs occur 
when there are only a few incoming aircraft. The 5 
minute variation in the number of inbound aircraft 
is slightly shifted toward the negative numbers for 
the GAs, meaning that GAs are more likely to occur 
when the number of aircraft inbound on the surface 
diminishes. This effect is not visible in the case of 
15 minute variations. 
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Fig. 10. 
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Analysis of the number of aircraft taxiing at SFO, 



9) Number of aircraft on the airport surface, 
outbound: Figure 11 presents the distributions of 
the number of inbound aircraft taxiing at SFO. 
Figure 11(a) shows the number of aircraft at the 
time the sample is taken and Figure |1 l(b)| shows 
the average number of aircraft over the preceding 
15 minutes. These measures reflect the current 
congestion at the airport, for aircraft taxiing-in. The 
shape of the distribution of the GAs and the nominal 
samples are very similar; there are slightly more 
GAs at the higher aircraft counts. It appears that a 
high number of outbound aircraft has an impact on 
the probability of having a GA, but some GAs occur 
when there are only a few incoming aircraft. The 
15 minutes plot suggests that having an average of 
more than 10 aircraft taxiing out has a significant 
impact on GAs. 

The difference between the number of aircraft 
simultaneously present at the time of the sample and 
5, 10 and 15 minutes in the past are not depicted, 
since they do not show any particularly interesting 
results. 
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Fig. 11. 
outbound 



Analysis of the number of aircraft taxiing at SFO, 



10) Delays: Figure 12 shows the distributions 



as a function of the number of aircraft delayed. 



Figure 12(a) shows the distribution of the number of 
inbound aircraft with a delay, taxiing in to the gate. 
Figure |12(b)| presents the distribution for delays 
greater than 20 minutes. It appears that GAs are 
less likely to occur when there are no delayed 
aircraft taxiing to a gate. For delays greater than 
20 minutes, although 40% of the time there are 
no aircraft delayed to this extent, 33% of the GAs 
occur under these conditions, indicating large delays 
may contribute to a GA. 



Figures 12(c) and 12(d) present the same distri- 
butions for the case of aircraft taxiing out. While 
25% of the time there are no more than two aircraft 



with a delay, 15% of the GAs occur under these 
conditions. 



distributions as a function of temperature. The plots 
suggest GAs are more likely to occur at "higher" 
temperatures. 
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Fig. 1 2. Analysis of the number of aircraft delayed at SFO 

11) Weather: Figure [T3] shows the distribution 
of of nominal and GA flights as a function of 



various weather parameters. Figure 13(a) presents 
the frequency of the samples as a function of the 
visibility. A visibility of 10 nmi indicates that the 
actual visibility was at least 10 nmi. It appears the 
visibility at SFO is greater than 10 nmi approxima- 
tively 83% of the time, but only 75% of go arounds 
occur during these conditions. GAs appear to occur 
at a greater rate during low visibility conditions 
with 25% of GAs occurring in 17% of the time in 
which visibility is lower. Adverse weather condition 
significantly increase the probability of GA, only 
25 % of GA occur during poor weather condi- 



tions. Figure 13(b) presents the nominal and GA 
distributions as a function of headwind. A negative 
headwind corresponds to a tailwind. Most of the 
flights land with positive to no headwind, and the 
headwind does not appear to be a significant cause 
of GAs. In negative headwinds, that is tailwinds, 
GAs appear to be more frequent. The crosswind, is 
not depicted but does not seem to have an impact 
on GAs. The data suggests crosswinds are not a 



leading cause of GAs. Figure 13(c) presents the 
altitude of the sky's ceiling. All ceiling altitudes 
over 10,000 ft were trimmed to 10,000 ft. The data 
suggests a low ceiling is associated with an increase 
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in the likelihood of a GA. Figure 13(d) shows the 



Fig. 13. Analysis of the weather related parameters 

A. Discussion of results 

From this analysis, three main factors leading to 
an increased probability of a GA are the weather, 
the airborne traffic density and aircraft mix, and 
finally, the ground traffic and its delays. 

1) Weather:: When the visibility or the ceiling 
are low, the rate of GAs is much higher than in 
good weather conditions. A likely explanation is 
the lack of visibility at decision hight forcing the 
pilot to initiate a missed approach and return. Wind, 
including tailwind and crosswind do not appear to 
have a significant impact on the probability of GAs 
to occur. The weather has a direct impact on GAs, 
but at SFO, the number of GAs due to poor weather 
conditions is only 25 %. 

2) Traffic density and aircraft mix:: The analysis 
suggests that having a large number of incoming air- 
craft increases the probability of having a GA. From 
a human factor's perspective, a large number of air- 
craft simultaneously present in the terminal airspace 
increases the workload of the controllers, probably 
leading to more "operational errors" and violation 
of landing minimum separation distance. The termi- 
nal airspace is rather small and congested, therefore 
dealing with many aircraft becomes quickly very 
complex and lead air traffic controllers to vector 



and reroute aircraft j9). In a previous study iflOl . 
it was shown that limiting the number of aircraft 
simultaneously present in the TRACON tends to 
allow for more direct routes, hence reducing the per- 
ceived complexity, and eventually, maybe reducing 
the probability of a GA. However, there are some 
GAs that occur when there are only a few incoming 
aircraft, perhaps a testament to the sporadic and 
haphazard nature of the event. 

The aircraft mix appears to have an effect on the 
likelihood of GA. A high number of large aircraft 
and heavy aircraft landing in the past 5 to 15 
minutes increases the probability of a GA. Possible 
explanations include a separation distance too small 
between aircraft. 

There is also evidence to suggest the variation in 
the number of planes being metered to OAK has 
an impact on the likelihood of a GA; a positive 
change in the number of aircraft incoming into 
OAK seems to increase the probability of a GA 
occurring at SFO. A possible explanation is again 
from a human factor's perspective for the controllers 
in charge of the sequencing and merging for SFO, 
OAK and SJC. Since most of the traffic in the 
TRACON is directed to SFO, a sudden increase in 
the number of aircraft inbound for OAK or SJC 
requires a shift of attention from the controller, 
probably breaking his current mental model ifTTl 
of the situation. This analysis shows the coupling 
effect between the airport, not only because of the 
traffic that need to be separated, but also from a 
controller's point of view. 

3) Ground traffic and delays: It appears that a 
large number of aircraft taxiing out at SFO increases 
the probability of a GA occurring. In addition, an 
average over 15 minutes of more thanlO aircraft 
taxiing out has a visible impact. Note also that 
delays affecting either inbound or outbound aircraft 
increase the probability of GAs. It appears that 
human errors such as runway incursion, holding 
lines violation or late takeoff from runway are more 
likely to occur during high density outbound ground 
traffic and when delays affect ground traffic. 

IV. An Alerting System for Go-Arounds 

In this section, we present a system to evaluate 
the potential of a GA. The first step is to classify 
GAs from nominal samples using the available data. 
The second step is to evaluate the potential of 



a GA at each time step. We first introduce the 
issues related to predicting these rare and poorly 
separated events, before presenting the results of 
our classification and prediction results based on 
the method of linear discriminant analysis. 

A. Classification and Prediction issues 

There are a number of factors that make classi- 
fication and temporal prediction of GAs difficult. 
First, GAs are a rare occurrence. Although this is 
auspicious form an air transportation perspective, 
it makes learning difficult and introduces a strong 
statistical bias in our corpus of training examples. 
To improve learning, it is natural to use a modified 
training set with roughly an equal number of nom- 
inal and GA samples. This can be accomplished 
by either withholding a large portion of nominal 
examples from the corpus, up-sampling the GAs, or 
a combination of the two methods. Unfortunately, 
these methods fail to address the more deep-seeded 
issue that GAs occur during standard phases of 
operation and therefore GA samples will generally 
have features closely aligned with nominal samples. 
In other words, nominal and GA samples are very 
poorly separated in the sample space. As false 
positives (nominal samples that are labelled as GA 
samples) are especially undesirable in our context, 
we must seek to learn relationships that separate 
the training data. Because of the poor separability 
of the samples, it is not possible to predict GAs and 
maintain a low rate of false positives. Therefore, we 
aim at evaluating "high risk" time samples which 
have a higher probability of having a GA. These 
high risk time samples will be denoted by an "alert 
level". The objective is to maximize the number of 
GAs positively identified during the alert level while 
minimizing the total number of samples in the alert 
level. 

B. Linear discriminant analysis 

Linear discriminant analysis is a statistical 
method commonly used to separate samples into 
several classes lfT2l . In our case, we are concerned 
with only two classes of samples: nominal state 
vector samples with label y = and GA state 
samples with label y = 1. We will assume GAs and 
nominal samples are generated according to a two- 
label Gaussian mixture model. For this purpose, our 
corpus of samples, {xi}f =l , Xf £ E 135 is split into 



two groups, a training group and a test group. A 
strong assumption made by LDA is that the condi- 
tional probability density functions p(x\y = 0) and 
p(x\y= 1) are both normally distributed with mean 
and co variance (/jo,En) and (/ji,Ei), respectively. 
Then, a feature vector x is assigned the label if it 
satisfies 

(x -/jq) t I.^ {x -/j ) +ln|E y=0 |- 

(jc-/ii) r l£ 1 (je-/ii)-]n|i^=i|<r, 

(1) 

where T is a threshold value that reflects the fre- 
quency of a label and the |E| denotes the determi- 
nant of E. Otherwise, x is assigned a value of 1. By 
sweeping T downward from a very large number 
to zero, we can progressively decrease the rate at 
which labels are assigned. That is, we can control 
the number of GAs that we are able to identify 
correctly. However, correctly identifying the bulk 
of the GAs comes at the expense of having a large 
number of false positives. 

C. An alert system for GA 

We used LDA on three years of data to classify 
GA and nominal samples. A separate year of data 
was withheld to test the classifier's predicative 
capabilities. The training set consists of randomly 
selected nominal samples and all the GA from 2006 
to 2008. The test set contains all the available 
samples from 2009. Figure [14] presents the result of 
the classification and prediction. The green curve 
(top) is corresponds to the training set and the blue 
curve (bottom) to the test set. The horizontal-axis 
represents the fraction of time predicted as alert- 
level, that is the fraction of the time where GAs are 
likely to occur. The y-axis represents the proportion 
of actual GAs that occurred during a period of alert- 
level. The crosses on the green line correspond to 
the different values of the threshold T (Eq. fTJ. By 
fixing a value of T, one can choose the amount 
of samples in alert-level. The dashed black line 
indicate the increased probability of a GA to occur 
during alert-level compared to the remaining of the 
time. The line lx indicates complete randomness 
of the prediction. Then, when the green line is 
over the 4x line, a GA is 4 times more likely to 
occur than during the remaining of the time. For 
instance, the green line intercepts the 9x line at 



15% of the time in threat level. This means that 
the predictor can be in "threat level" 15% of the 
time and capture 39% of the GAs. During alert- 
level time, GAs are 9 times more likely to occur 
than during the remaining 85% of the time. To 
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Fig. 14. Classification of GA using LDA 

further analyze the results of the predictive system, 
we looked at the distribution over time of the 



samples predicted in alert level. Figure 15 presents 



the time distribution of the nominal samples, GA 
and samples identified in alert-level for 2009. In this 
example, the value of T was selected so that 15% of 
the samples are in threat level, capturing 39% of the 
GA. The alert-level curve follows the same pattern 
as the GA over time. The predictive system slightly 
over-estimate the risk during the morning peak and 
under-estimates the risks during the evening peak. 

V. Conclusion 

This paper investigated a number of airport oper- 
ational features, each of which is readily accessible 
to on-duty air traffic controllers, and the role fluc- 
tuations in them may play in precipitating a missed 
approach. We showed the important interconnec- 
tion between surface operations, airborne opera- 
tions, airports located in the vicinity of each other, 
air traffic control and delays. By analyzing how 
the distribution of these features varied between 
nominal and go-around operations, we provided a 
statistical mechanism to gain insight into which 
factors are more likely to be a discernible precursor 
to go-arounds. Interpretations for these results were 
provided in terms of the current operational policies 



Year 2009, Alert Level = 15%, GA = 38% 
0.16 | 1 1 1 




6 8 10 12 14 16 18 20 22 24 
Time of the day 



Fig. 15. Time distribution of nominal samples, GA and alert- 
level 

in place at busy metropolitan airports. Armed with 
the new insight afforded by these statistics, we 
proposed a framework for developing an automated 
alert system for high missed approach potential sit- 
uations. Although the machine learning techniques 
employed for this purpose would mandate the alert 
system be "on" for an large amount of time in order 
to capture most of the go-arounds. It appears that 
the prediction of go-arounds is a very challenging 
task and this study highlighted unexpected factors 
that have an impact on the probability of a missed 
approach. These factors such as airport coupling 
effect for controllers need to be accounted for in 
the future design of metroplexes and high-density 
operations. 
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